Citation Proximity Analysis (CPA) – A new approach for identifying related work based on Co-Citation Analysis
Authors
Abstract
This paper presents an approach for identifying similar documents that can be used to assist scientists in finding related work. The approach, called Citation Proximity Analysis (CPA), is a further development of co-citation analysis that additionally considers the proximity of citations to each other within an article's full text. The underlying idea is that the closer citations are to each other, the more likely it is that they are related. Compared to existing approaches, such as bibliographic coupling, co-citation analysis or keyword-based approaches, the advantages of CPA are higher precision and the possibility of identifying related sections within documents. Moreover, CPA allows a more precise automatic document classification. CPA is used as the primary approach to analyse the similarity of, and to classify, the 1.2 million publications contained in the research paper recommender system Scienstein.org.
Introduction and Motivation
The search for related scientific work can be tedious, and important documents are often missed. Difficulties are caused by an increasing number of publications, growing exponentially at a yearly rate of 3.7 %, as well as by unclear nomenclature, synonyms and numerous other factors [1]. In practice, most searches for related work start with some initial papers and navigate the citation web nearest to those papers. However, even the more advanced approaches for identifying related work, based on co-word analysis, collaborative filtering, Subject-Action-Object (SAO) structures or citation analysis, often do not deliver satisfying results [2-8]. Therefore, we developed a new approach to determine the similarity of documents, which we name Citation Proximity Analysis (CPA). The approach is based on co-citation analysis and improves precision by considering the position of citations. The presented approach was developed for the research paper recommender Scienstein¹ to assist researchers in finding related work.
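The difference between plain co-citation counting and a proximity-aware score can be illustrated with a minimal Python sketch. It is not the authors' implementation: the weight values in `LEVEL_WEIGHTS`, the `(section, paragraph, sentence)` position tuples and all function names are assumptions made for illustration only, since the text above does not specify how proximity is scored.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical proximity weights (an assumption for illustration; the text
# above does not state which values are used).
LEVEL_WEIGHTS = {"sentence": 1.0, "paragraph": 0.5, "section": 0.25, "document": 0.125}


def closest_level(pos_a, pos_b):
    """Return the smallest text unit shared by two citation positions.

    Each position is a (section, paragraph, sentence) tuple of indices,
    where paragraph is counted within its section and sentence within
    its paragraph.
    """
    if pos_a == pos_b:
        return "sentence"
    if pos_a[:2] == pos_b[:2]:
        return "paragraph"
    if pos_a[0] == pos_b[0]:
        return "section"
    return "document"


def proximity_scores(citations):
    """Score every pair of distinct works cited in ONE citing document.

    citations: list of (cited_id, position) tuples.
    Returns {(id_a, id_b): weight}, keeping the closest occurrence per pair.
    """
    best = defaultdict(float)
    for (id_a, pos_a), (id_b, pos_b) in combinations(citations, 2):
        if id_a == id_b:
            continue  # the same work cited twice is not a co-citation
        pair = tuple(sorted((id_a, id_b)))
        weight = LEVEL_WEIGHTS[closest_level(pos_a, pos_b)]
        best[pair] = max(best[pair], weight)
    return dict(best)


def aggregate(corpus):
    """Sum plain co-citation counts and proximity weights over all citing documents."""
    cocitations = defaultdict(int)
    proximity = defaultdict(float)
    for citations in corpus:
        for pair, weight in proximity_scores(citations).items():
            cocitations[pair] += 1
            proximity[pair] += weight
    return dict(cocitations), dict(proximity)


# Two toy citing documents, each a list of (cited work, position) entries.
doc1 = [("A", (0, 0, 0)), ("B", (0, 0, 0)), ("C", (1, 0, 0))]
doc2 = [("A", (0, 0, 0)), ("B", (0, 1, 2)), ("C", (0, 1, 2))]
counts, scores = aggregate([doc1, doc2])
```

In this toy corpus every pair is co-cited twice, so plain co-citation counts cannot separate them, whereas the proximity-weighted totals rank the pairs that share a sentence above the pair that never does.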
The first part of this paper gives an overview of existing methods for identifying similar documents, focusing on the most popular citation analysis approaches and their strengths and weaknesses. The second part explains how CPA can be used to measure similarity and the steps necessary to calculate a new metric that we call the Citation Proximity Index (CPI). Afterwards, first results from an empirical study comparing the performance of co-citation analysis and CPA are presented. Finally, an outlook on further implications and on how CPA could be used in other fields is given.
¹ www.scienstein.org is a research paper recommender, developed by the authors, that focuses on identifying related work.
Related Work
Various approaches exist to determine the degree of similarity of documents in order to identify related work. Whereas text-mining approaches are used in cases in which references are not stated, citation analysis approaches usually deliver superior results because, for example, synonyms and unclear nomenclature do not lead to misleading results [3, 4, 5]. Many citation analysis approaches exist, and they all have their own strengths and weaknesses for identifying similar documents. Among the most widely used are the easily applicable "cited by" approach, which considers papers relevant if they cite the same input document, and the "reference list" approach, which considers papers relevant if they were referenced by the input document. The best results can usually be obtained by bibliographic coupling and co-citation analysis, which allow calculating the coupling strength [6]. These approaches, which were invented in the 1960s and 1970s, are used by scientists and on academic search engine websites like CiteSeer²
Similar resources
Identifying Related Work and Plagiarism by Citation Analysis
This updated and revised paper gives an overview of my PhD research. It focuses on two newly developed approaches. Citation Proximity Analysis (CPA) allows the identification of related work by analyzing the co-occurrence of citations within documents. In contrast to co-citation analysis various factors, such as the proximity of citations to each other, are taken into account. The second approa...
Can we do better than Co-Citations? - Bringing Citation Proximity Analysis from idea to practice in research article recommendation
In this paper, we build on the idea of Citation Proximity Analysis (CPA), originally introduced in [1], by developing a step by step scalable approach for building CPA-based recommender systems. As part of this approach, we introduce three new proximity functions, extending the basic assumption of co-citation analysis (stating that the more often two articles are co-cited in a document, the mor...
Identifying Related Documents For Research Paper Recommender By CPA and COA
This work-in-progress paper introduces two new approaches called Citation Proximity Analysis (CPA) and Citation Order Analysis (COA). They can be applied to identify related documents for the purpose of research paper recommender systems. CPA is a variant of co-citation analysis that additionally considers the proximity of citations to each other within an article’s full-text. The underlying id...
The analysis of co-citation and word co-occurrence networks of Iranian articles in the field of dentistry
Background and Aims: Dentistry is an important profession ensuring the health of body and soul, and has a special place in the scientific productions of medical disciplines. The purpose of this study was to analyze the co-citation and word co-occurrence of Iranian research papers in the field of dentistry based on indexed documents in Web of Science from 2014 to 2018. Materials and Methods:...
Drawing Co-Citation Networks of Corona Virus Studies
Background and Aim: The purpose of the present study is to map the coronavirus domain citation network to better understand this domain based on all other citation networks. Materials and Methods: The present study is applied in terms of purpose, and is descriptive scientometrics in terms of type, which has been done with the all-citation method. In this study, all scientific publications on ...
Publication date: 2009